100 research outputs found
CoPhy: A Scalable, Portable, and Interactive Index Advisor for Large Workloads
Index tuning, i.e., selecting the indexes appropriate for a workload, is a
crucial problem in database system tuning. In this paper, we solve index tuning
for large problem instances that are common in practice, e.g., thousands of
queries in the workload, thousands of candidate indexes and several hard and
soft constraints. Our work is the first to reveal that the index tuning problem
has a well structured space of solutions, and this space can be explored
efficiently with well known techniques from linear optimization. Experimental
results demonstrate that our approach outperforms state-of-the-art commercial
and research techniques by a significant margin (up to an order of magnitude).Comment: VLDB201
SeeDB: automatically generating query visualizations
Data analysts operating on large volumes of data often rely on visualizations to interpret the results of queries. However, finding the right visualization for a query is a laborious and time-consuming task. We demonstrate SeeDB, a system that partially automates this task: given a query, SeeDB explores the space of all possible visualizations, and automatically identifies and recommends to the analyst those visualizations it finds to be most "interesting" or "useful". In our demonstration, conference attendees will see SeeDB in action for a variety of queries on multiple real-world datasets
Web information management with access control
International audienceWe investigate the problem of sharing private information on the Web, where the information is hosted on diïŹerent machines that may use diïŹerent access control and distribution schemes. We introduce a distributed knowledge-base model, termed WebdamExchange, that comprises logical statements for specifying data, access control, distribution and knowledge about other peers. The statements can be communicated, replicated, queried, and updated, while keeping track of time and provenance. This uniïŹed base allows applications to reason declaratively about what data is accessible, where it resides, and how to retrieve it securely
Towards a Workload for Evolutionary Analytics
Emerging data analysis involves the ingestion and exploration of new data
sets, application of complex functions, and frequent query revisions based on
observing prior query answers. We call this new type of analysis evolutionary
analytics and identify its properties. This type of analysis is not well
represented by current benchmark workloads. In this paper, we present a
workload and identify several metrics to test system support for evolutionary
analytics. Along with our metrics, we present methodologies for running the
workload that capture this analytical scenario.Comment: 10 page
Predictable performance and high query concurrency for data analytics
Conventional data warehouses employ the query- at-a-time model, which maps each query to a distinct physical plan. When several queries execute concurrently, this model introduces contention and thrashing, because the physical plansâunaware of each otherâcompete for access to the underlying I/O and computation resources. As a result, while modern systems can efficiently optimize and evaluate a single complex data analysis query, their performance suffers significantly and can be highly erratic when multiple complex queries run at the same time. We present in this paper Cjoin , a new design that substantially improves throughput in large-scale data analytics systems processing many concurrent join queries. In contrast to the conventional query-at-a-time model, our approach employs a single physical plan that shares I/O, computation, and tuple storage across all in-flight join queries. We use an âalways onâ pipeline of non-blocking operators, managed by a controller that continuously examines the current query mix and optimizes the pipeline on the fly. Our design enables data analytics engines to scale gracefully to large data sets, provide predictable execution times, and reduce contention. We implemented Cjoin as an extension to the PostgreSQL DBMS. This prototype outperforms conventional commercial systems by an order of magnitude for tens to hundreds of concurrent queries
- âŠ